Global-Scale Resource Survey and Performance Monitoring of Public OGC Web Map Services
One of the most widely implemented service standards that the Open
Geospatial Consortium (OGC) provides to the user community is the Web Map
Service (WMS). WMS is employed globally, but little is known about the global
distribution, adoption status, or service quality of these online WMS
resources. To fill this gap, we investigated global WMS resources and
performed distributed performance monitoring of these services. This paper
describes a crawling method used to discover these WMSs and a distributed
monitoring framework that monitored 46,296 WMSs continuously for over one
year. We analyzed server locations, provider types, themes, the spatiotemporal
coverage of map layers and the service versions for 41,703 valid WMSs.
Furthermore, we appraised the stability and performance of two basic
operations (GetCapabilities and GetMap) for 1210 selected WMSs. We discuss the major
reasons for request errors and performance issues, as well as the relationship
between service response times and the spatiotemporal distribution of client
monitoring sites. This paper will help service providers, end users and
developers of standards to grasp the status of global WMS resources, as well as
to understand the adoption status of OGC standards. The conclusions drawn in
this paper can benefit geospatial resource discovery and service performance
evaluation, and can guide service performance improvements.
Comment: 24 pages; 15 figures
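A minimal sketch of the two basic operations whose stability was monitored. It only builds the request URLs, assuming WMS 1.3.0 key-value-pair encoding and a hypothetical endpoint; the paper's crawler and distributed monitoring framework are not reproduced here.

```python
from urllib.parse import urlencode


def _with_query(endpoint, params):
    """Append KVP parameters, respecting any query string already on the endpoint."""
    sep = "&" if "?" in endpoint else "?"
    return endpoint + sep + urlencode(params)


def get_capabilities_url(endpoint, version="1.3.0"):
    """URL for a WMS GetCapabilities request (the service metadata document)."""
    return _with_query(endpoint, {"SERVICE": "WMS", "REQUEST": "GetCapabilities",
                                  "VERSION": version})


def get_map_url(endpoint, layer, bbox, width=256, height=256,
                crs="EPSG:4326", fmt="image/png"):
    """URL for a WMS 1.3.0 GetMap request for one layer with its default style.

    Under WMS 1.3.0 the BBOX follows the CRS axis order, which is
    latitude, longitude for EPSG:4326 (1.1.1 used SRS and lon/lat instead).
    """
    params = {
        "SERVICE": "WMS", "REQUEST": "GetMap", "VERSION": "1.3.0",
        "LAYERS": layer, "STYLES": "", "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width, "HEIGHT": height, "FORMAT": fmt,
    }
    return _with_query(endpoint, params)
```

For example, get_map_url("https://example.org/wms", "roads", (-90, -180, 90, 180)) requests a whole-world 256x256 PNG of a hypothetical "roads" layer. A monitor could time each such request (for instance with time.perf_counter around urllib.request.urlopen) and log HTTP errors; that part is left out because it needs a live service.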
A Robust and Efficient Boundary Point Detection Method by Measuring Local Direction Dispersion
Boundary points pose a significant challenge for machine learning tasks,
including classification, clustering, and dimensionality reduction. Due to the
similarity of features, boundary areas can result in mixed-up classes or
clusters, leading to a crowding problem in dimensionality reduction. To address
this challenge, numerous boundary point detection methods have been developed,
but they are insufficient to accurately and efficiently identify the boundary
points in non-convex structures and high-dimensional manifolds. In this work,
we propose a robust and efficient method for detecting boundary points using
Local Direction Dispersion (LoDD). LoDD assumes that internal points are
surrounded by neighboring points in all directions, while neighboring points of
a boundary point tend to be distributed only in a certain directional range.
LoDD adopts a density-independent K-Nearest Neighbors (KNN) method to determine
neighboring points, and defines a statistic-based metric using the eigenvalues
of the covariance matrix of KNN coordinates to measure the centrality of a
query point. We demonstrated the validity of LoDD on five synthetic datasets
(2-D and 3-D) and ten real-world benchmarks, and tested its clustering
performance by equipping it with two typical clustering methods, K-means and Ncut.
Our results show that LoDD achieves promising and robust detection accuracy in
a time-efficient manner.
Comment: 11 pages, 6 figures, 3 tables
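To make the direction-dispersion idea concrete, here is a simplified sketch. It is not the paper's LoDD metric: it uses plain brute-force Euclidean KNN rather than the density-independent variant, and a hypothetical score, the total variance (the trace of the covariance matrix, i.e. the sum of its eigenvalues) of the unit vectors pointing from the query point to its neighbors. Because that total variance equals 1 minus the squared norm of the mean unit vector, it is close to 1 when neighbors surround the point and drops when they crowd into one side, as they do at a boundary.

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i] (brute force, Euclidean)."""
    q = points[i]
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(q, points[j]))
    return others[:k]


def direction_dispersion(points, i, k):
    """Hypothetical boundary score: total variance of unit directions to the KNN.

    Assumes at least one neighbor differs from the query point.
    """
    q = points[i]
    dirs = []
    for j in knn(points, i, k):
        v = [a - b for a, b in zip(points[j], q)]
        norm = math.hypot(*v)
        if norm > 0:  # skip exact duplicates of the query point
            dirs.append([c / norm for c in v])
    n, d = len(dirs), len(q)
    mean = [sum(u[c] for u in dirs) / n for c in range(d)]
    # Diagonal entries of the (population) covariance matrix of the directions;
    # their sum is the trace, which equals the sum of the eigenvalues.
    return sum(sum((u[c] - mean[c]) ** 2 for u in dirs) / n for c in range(d))
```

On a regular 5x5 grid with k = 8, the center point scores 1 while the corner point scores lower. A point could then be flagged as a boundary point when its score falls below a dataset-specific threshold (the thresholding rule is also an assumption).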
MeanCut: A Greedy-Optimized Graph Clustering via Path-based Similarity and Degree Descent Criterion
As the most typical graph clustering method, spectral clustering is popular
and attractive due to its remarkable performance, easy implementation, and
strong adaptability. Classical spectral clustering measures the edge weights of
the graph using a pairwise Euclidean-based metric, and solves for the optimal graph
partition by relaxing the constraints of the indicator matrix and performing
Laplacian decomposition. However, Euclidean-based similarity might cause skewed
graph cuts when handling non-spherical data distributions, and the relaxation
strategy introduces information loss. Meanwhile, spectral clustering requires
specifying the number of clusters, which is hard to determine without enough
prior knowledge. In this work, we leverage path-based similarity to enhance
intra-cluster associations, propose MeanCut as the objective function, and
greedily optimize it in degree-descending order to obtain a nondestructive graph
partition. This algorithm enables the identification of arbitrarily shaped
clusters and is robust to noise. To reduce the computational complexity of
similarity calculation, we transform optimal path search into generating the
maximum spanning tree (MST), and develop a fast MST (FastMST) algorithm to
further improve its time-efficiency. Moreover, we define a density gradient
factor (DGF) for separating the weakly connected clusters. The validity of our
algorithm is demonstrated by tests on real-world benchmarks and an
application to face recognition. The source code of MeanCut is available at
https://github.com/ZPGuiGroupWhu/MeanCut-Clustering.
Comment: 17 pages, 8 figures, 6 tables
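Path-based similarity can be read as a bottleneck (minimax) similarity: the similarity of two nodes is the best achievable value of the weakest edge over all paths joining them. A known property of this similarity is that it equals the weakest edge on the path between the two nodes in a maximum spanning tree, which is why an MST makes the computation cheaper. The sketch below assumes that definition and uses textbook Kruskal with union-find; it is not the authors' FastMST algorithm, and the (weight, u, v) edge format is an assumption.

```python
from collections import defaultdict


def maximum_spanning_tree(n, edges):
    """Kruskal's algorithm on (weight, u, v) edges, heaviest first; returns adjacency."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = defaultdict(list)
    for w, u, v in sorted(edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components, so it cannot form a cycle
            parent[ru] = rv
            tree[u].append((v, w))
            tree[v].append((u, w))
    return tree


def path_similarity(tree, src):
    """Weakest edge along the unique tree path from src to every reachable node."""
    sim = {src: float("inf")}
    stack = [src]
    while stack:
        u = stack.pop()
        for v, w in tree[u]:
            if v not in sim:
                sim[v] = min(sim[u], w)
                stack.append(v)
    return sim
```

With edges [(0.9, 0, 1), (0.8, 1, 2), (0.1, 2, 3), (0.2, 0, 3), (0.3, 0, 2)], nodes 0 and 2 share only a weak direct edge (0.3), yet their path-based similarity is 0.8 because the route through node 1 never drops below 0.8. This is how path-based similarity strengthens intra-cluster associations along chains of close points.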
Interpreting the Curse of Dimensionality from Distance Concentration and Manifold Effect
The characteristics of data, such as distribution and heterogeneity, become more
complex and counterintuitive as the dimensionality increases. This phenomenon
is known as the curse of dimensionality, where common patterns and relationships
(e.g., internal and boundary patterns) that hold in low-dimensional space may be
invalid in higher-dimensional space. It degrades the performance of
regression, classification, and clustering models and algorithms. The curse of
dimensionality can be attributed to many causes. In this paper, we first
summarize five challenges associated with manipulating high-dimensional data,
and explain the potential causes for the failure of regression, classification,
or clustering tasks. Subsequently, we delve into two major causes of the curse
of dimensionality, distance concentration and manifold effect, by performing
theoretical and empirical analyses. The results demonstrate that nearest
neighbor search (NNS) using three typical distance measurements, Minkowski
distance, Chebyshev distance, and cosine distance, becomes meaningless as the
dimensionality increases. Meanwhile, the data incorporates more redundant
features, and the variance contribution of principal component analysis (PCA)
is skewed towards a few dimensions. By interpreting the causes of the curse of
dimensionality, we can better understand the limitations of current models and
algorithms, and work toward improving the performance of data analysis and machine
learning tasks in high-dimensional space.
Comment: 17 pages, 11 figures
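Distance concentration can be observed with a small experiment. The sketch below draws a random query and n random points uniformly from the unit hypercube (an assumed data distribution, not the paper's experimental setup) and reports the relative contrast (D_max - D_min) / D_min of their distances, for Euclidean distance (Minkowski with p = 2) and Chebyshev distance. As the dimensionality grows, the contrast shrinks toward zero, so the nearest and farthest neighbors become hard to tell apart.

```python
import math
import random


def chebyshev(a, b):
    """Chebyshev (L-infinity) distance."""
    return max(abs(x - y) for x, y in zip(a, b))


def relative_contrast(dim, n=200, dist=math.dist, seed=0):
    """(D_max - D_min) / D_min from one random query to n uniform points in [0, 1]^dim."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    query = [rng.random() for _ in range(dim)]
    points = [[rng.random() for _ in range(dim)] for _ in range(n)]
    d = [dist(query, p) for p in points]
    return (max(d) - min(d)) / min(d)
```

Comparing relative_contrast(2) with relative_contrast(500), and likewise with dist=chebyshev, shows the contrast collapsing by more than an order of magnitude as the dimension rises.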
Scalable manifold learning by uniform landmark sampling and constrained locally linear embedding
As a pivotal approach in machine learning and data science, manifold learning
aims to uncover the intrinsic low-dimensional structure within complex
nonlinear manifolds in high-dimensional space. By exploiting the manifold
hypothesis, various techniques for nonlinear dimension reduction have been
developed to facilitate visualization, classification, and clustering, and to
yield key insights. Although existing manifold learning methods have achieved
remarkable successes, they still suffer from extensive distortions of the
global structure, which hinder the understanding of underlying patterns.
Scalability issues also limit their applicability to large-scale
data. Here, we propose a scalable manifold learning (scML) method that can
handle large-scale and high-dimensional data efficiently. It
starts by seeking a set of landmarks to construct the low-dimensional skeleton
of the entire data, and then incorporates the non-landmarks into the learned
space based on the constrained locally linear embedding (CLLE). We empirically
validated the effectiveness of scML on synthetic datasets and real-world
benchmarks of different types, and applied it to analyze single-cell
transcriptomics data and to detect anomalies in electrocardiogram (ECG) signals. scML
scales well with increasing data sizes and embedding dimensions, and exhibits
promising performance in preserving the global structure. The experiments
also demonstrate notable robustness of the embedding quality as the sampling
rate decreases.
Comment: 33 pages, 10 figures
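The second stage of scML, placing non-landmarks relative to the landmark skeleton, can be sketched with the standard out-of-sample step of locally linear embedding: reconstruct a point from its k nearest landmarks with weights constrained to sum to one, then apply the same weights to the landmarks' low-dimensional coordinates. This is plain LLE weight fitting with a small regularizer (reg = 1e-3 is an assumed value); the constraints specific to the paper's CLLE and its landmark sampling are not reproduced.

```python
import math


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lle_weights(x, neighbors, reg=1e-3):
    """Weights summing to 1 that minimise |x - sum_j w_j * neighbors[j]|^2."""
    diffs = [[a - b for a, b in zip(nb, x)] for nb in neighbors]
    k = len(neighbors)
    gram = [[sum(p * q for p, q in zip(diffs[i], diffs[j])) for j in range(k)]
            for i in range(k)]
    trace = sum(gram[i][i] for i in range(k))
    eps = reg * trace if trace > 0 else reg  # keeps the local Gram matrix invertible
    for i in range(k):
        gram[i][i] += eps
    w = solve(gram, [1.0] * k)
    s = sum(w)
    return [wi / s for wi in w]  # enforce the sum-to-one constraint


def embed_point(x, landmarks, landmark_emb, k):
    """Place a non-landmark x using the weights of its k nearest landmarks."""
    idx = sorted(range(len(landmarks)), key=lambda i: math.dist(x, landmarks[i]))[:k]
    w = lle_weights(x, [landmarks[i] for i in idx])
    dim = len(landmark_emb[0])
    return [sum(wi * landmark_emb[i][c] for wi, i in zip(w, idx)) for c in range(dim)]
```

For four landmarks at the corners of a unit square in the plane z = 0, embedded by their first two coordinates, the point (0.5, 0.5, 0) receives equal weights and lands at (0.5, 0.5). Because each non-landmark only solves a k-by-k system, this step scales linearly in the number of non-landmarks.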
Facile synthesis of graphene sheets intercalated by carbon spheres for high-performance supercapacitor electrodes
Composites of graphene oxides (GOs) and carbon spheres (CSs) were prepared by mechanically mixing GOs with CSs (average diameter 200 nm) hydrothermally derived from an aqueous glucose solution, and the GOs/CSs (GCSs) were then thermally treated at 700–900 °C. In the GCS composites, the CSs act as spacers between the GO sheets and prevent the graphene sheets from aggregating and restacking. The GCS composite with GO/CS = 1 treated at 800 °C (GCS@800) shows high specific capacitances of 272.8 and 197.5 F g−1 at current densities of 0.2 and 10 A g−1, respectively, in a three-electrode cell with 6 M KOH aqueous electrolyte, and demonstrates high rate capability and good cycling stability. The excellent electrochemical performance of the GCS@800 electrode is attributed to its hierarchical porous structure, consisting predominantly of micropores with a few macropores. This work provides a simple and effective technique for integrating CSs and graphene sheets into composite structures for high-performance energy storage devices.
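The capacitances above are gravimetric values at fixed current densities. A common way to obtain such values, assumed here because the abstract does not state the measurement method, is from a galvanostatic discharge curve, C_s = (I/m)·Δt/ΔV. The sketch below encodes that formula together with the rate-capability ratio; only the two capacitances come from the text, and any discharge time or potential window passed in is a placeholder input.

```python
def specific_capacitance(current_density, discharge_time, potential_window):
    """Gravimetric capacitance C_s = (I/m) * dt / dV.

    Units: current_density in A/g, discharge_time in s, potential_window in V;
    the result is in F/g.
    """
    return current_density * discharge_time / potential_window


def capacitance_retention(c_low_rate, c_high_rate):
    """Percentage of the low-rate capacitance kept at the higher current density."""
    return 100.0 * c_high_rate / c_low_rate
```

With the reported values, capacitance_retention(272.8, 197.5) gives about 72.4%: roughly 72% of the 0.2 A g−1 capacitance is retained at a 50-fold higher current density of 10 A g−1, which is what the "high rate capability" claim rests on.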
- …